Three-dimensional convolutional neural network-based classification of chronic kidney disease severity using kidney MRI

A three-dimensional convolutional neural network model was developed to classify the severity of chronic kidney disease (CKD) using magnetic resonance imaging (MRI) Dixon-based T1-weighted in-phase (IP)/opposed-phase (OP)/water-only (WO) imaging. Seventy-three patients with severe renal dysfunction (estimated glomerular filtration rate [eGFR] < 30 mL/min/1.73 m2, CKD stage G4–5); 172 with moderate renal dysfunction (30 ≤ eGFR < 60 mL/min/1.73 m2, CKD stage G3a/b); and 76 with mild renal dysfunction (eGFR ≥ 60 mL/min/1.73 m2, CKD stage G1–2) participated in this study. The model was applied to the right, left, and both kidneys, as well as to each imaging method (T1-weighted IP/OP/WO images). The best performance was obtained when using bilateral kidneys and IP images, with an accuracy of 0.862 ± 0.036. The overall accuracy was better for the bilateral kidney models than for the unilateral kidney models. Our deep learning approach using kidney MRI can be applied to classify patients with CKD based on the severity of kidney disease.


Participants
This study was approved by the Research Ethics Committee of the Saitama Medical University Hospital (approval number 2022-107). All experiments were performed in accordance with relevant guidelines and regulations. The requirement for informed consent was waived by the Research Ethics Committee of Saitama Medical University Hospital.
The enrolled participants partially overlapped with those in our previous work on radiomic analysis and automatic segmentation of kidney MRI, which addressed questions unrelated to the present study. Figure 1 summarizes the inclusion and exclusion criteria. We identified and reviewed 423 patients referred from the Department of Nephrology at our hospital who underwent kidney MRI between January 2013 and December 2022. The inclusion criteria were (1) age ≥ 15 years and (2) MRI with Dixon-based T1-weighted in-phase (IP)/opposed-phase (OP)/water-only (WO) images acquired at our hospital. The exclusion criteria were (1) lack of Dixon-based T1WI (n = 35); (2) insufficient clinical or laboratory data (n = 1); (3) high-grade kidney atrophy making segmentation difficult (n = 6); (4) severe artifacts on MRI (n = 31); and (5) renal lesions with a maximal diameter > 1 cm or more than five renal masses in either kidney, including polycystic kidney disease (n = 29). In total, 321 patients were included in this study.
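As a check on the reported counts, the five exclusion categories account exactly for the difference between the reviewed and the enrolled cohorts:

$$423 - (35 + 1 + 6 + 31 + 29) = 423 - 102 = 321$$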
Table 1 details the distribution of the study population in each eGFR group.

Data
MR images were acquired using a 3.0-T superconducting unit (Skyra; Siemens Healthcare, Erlangen, Germany) with a spine coil and an 18-channel phased-array body coil. For all participants in the present analysis, we obtained Dixon-based T1-weighted IP/OP/WO images (only IP/OP/WO images were used in this analysis because other images, including fat-only images and fat fraction ratio maps, were not generated for all patients). Representative …
Table 1. Demographic and clinical characteristics of the study population. Except where otherwise indicated, data are presented as number (%) of patients. se-RD, severe renal dysfunction (eGFR < 30 mL/min/1.73 m2, CKD stage G4–5); mo-RD, moderate renal dysfunction (30 ≤ eGFR < 60 mL/min/1.73 m2, CKD stage G3a/3b); mi-RD, mild renal dysfunction (eGFR ≥ 60 mL/min/1.73 m2, CKD stage G1–2); IgA, immunoglobulin A; SD, standard deviation; CKD, chronic kidney disease.

Image processing and model implementation
We constructed 3D CNN models for the bilateral kidneys and for each unilateral kidney (right or left), using each imaging method (T1-weighted IP/OP/WO images), and evaluated and compared their classification performance. An overview of the image data processing scheme is shown in Fig. 2. Image preprocessing was performed using open-source software (3D Slicer version 5.0.3). For each kidney MRI, volumes were cropped around the right and left kidneys; each resulting volume had 24 coronal slices of 128 × 128 px. Bilateral kidney data were obtained by stacking the right and left kidney volumes, yielding 48 slices of 128 × 128 px. These volumes were then resized to 8 slices of 56 × 56 px for unilateral kidney data and 16 slices of 56 × 56 px for bilateral kidney data. The voxel intensities of each image were normalized by rescaling them to the range [0, 255].
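The cropping, stacking, resizing, and rescaling steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' 3D Slicer workflow: the paper does not state the interpolation method, so nearest-neighbour resampling is assumed here, per-volume min-max scaling is assumed to be what "rescaling the intensities into the range of [0, 255]" means, and the function names (`resample_nearest`, `rescale_0_255`, `preprocess`) are hypothetical.

```python
import numpy as np


def resample_nearest(volume, out_shape):
    """Nearest-neighbour resampling of a (slices, rows, cols) array to out_shape (assumed method)."""
    idx = [
        np.clip(np.round((np.arange(n_out) + 0.5) * n_in / n_out - 0.5).astype(int), 0, n_in - 1)
        for n_in, n_out in zip(volume.shape, out_shape)
    ]
    return volume[np.ix_(*idx)]


def rescale_0_255(volume):
    """Min-max rescale intensities to [0, 255]; a constant volume maps to zeros."""
    v = volume.astype(np.float32)
    vmin, vmax = float(v.min()), float(v.max())
    if vmax == vmin:
        return np.zeros_like(v)
    return (v - vmin) / (vmax - vmin) * 255.0


def preprocess(right_crop, left_crop, size=(8, 56, 56)):
    """right_crop, left_crop: 24 x 128 x 128 crops around each kidney (hypothetical inputs)."""
    right = resample_nearest(right_crop, size)           # 8 x 56 x 56
    left = resample_nearest(left_crop, size)             # 8 x 56 x 56
    bilateral = np.concatenate([right, left], axis=0)    # stacked along slices: 16 x 56 x 56
    return rescale_0_255(right), rescale_0_255(left), rescale_0_255(bilateral)
```

With this index rule, resampling each 24-slice kidney to 8 slices and then stacking selects the same slices as resampling the 48-slice stack to 16, so the order of stacking and resizing does not matter in this sketch.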
All image data were subsequently converted to NumPy binary files (.npy) and saved together in one directory, along with a CSV file containing clinical information such as the eGFR group. The data were randomly split into 70% for training, 10% for validation, and 20% for testing. We performed five-fold cross-validation in which each fold held out a different 20% of the data as the test set, so that every patient appeared in a test set exactly once and the models were evaluated on the entire dataset.
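A minimal sketch of this splitting scheme follows, assuming a plain (non-stratified) random permutation as the text describes; the function name `five_fold_splits`, the seed, and the rounding of the validation size are hypothetical choices. The five test folds partition all patients, and the remaining patients in each fold are divided into a validation set (about 10% of all patients) and a training set (about 70%).

```python
import numpy as np


def five_fold_splits(n_patients, n_folds=5, n_val=None, seed=0):
    """Return a list of (train, val, test) index arrays, one tuple per fold."""
    rng = np.random.default_rng(seed)
    # Disjoint test folds covering every patient once (~20% each)
    test_folds = np.array_split(rng.permutation(n_patients), n_folds)
    if n_val is None:
        n_val = int(round(0.10 * n_patients))  # ~10% of all patients for validation
    splits = []
    for k, test_idx in enumerate(test_folds):
        rest = np.concatenate([f for j, f in enumerate(test_folds) if j != k])
        rest = rng.permutation(rest)
        splits.append((rest[n_val:], rest[:n_val], test_idx))  # train gets the remaining ~70%
    return splits
```

With 321 patients, this gives test folds of 65 or 64 patients, 32 validation patients, and 224 or 225 training patients per fold.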
The 3D CNN model was trained using Python 3.9 (Python Software Foundation, Beaverton, OR, USA) and the Faimed3D library (https://kbressem.github.io/faimed3d) with a PyTorch 1.9.0 (Facebook's AI Research lab) backend. Data transformation and augmentation were performed using Faimed3D transformations: random flipping of the input image along any axis in 3D; random rotation of the input image by 90°, 180°, or 270°, or by an arbitrary angle; random cropping of the 3D volume; random adjustment of brightness and contrast; and random generation of various artifact imitations with warping, shearing, trapezoid distortion, Gaussian noise, and blur. Our 3D CNN model is based on the 3D ResNet architecture included in the Faimed3D software package. We used the default ResNet 3D-18 network, pre-trained on an action recognition dataset. Figure 3 shows the architecture of ResNet 3D-18. The models were trained for 100 iterations on a Windows 10 workstation with a single GeForce RTX 3090 graphics processing unit. Overall, five models were trained (one per fold), with a training time of approximately 10 min per fold. Each trained model then classified its test data in approximately 5 s. Finally, the predictions were evaluated using basic classification metrics, including accuracy, precision, recall/sensitivity, specificity, and F1 score, and the results were averaged across folds.
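Several of the augmentations listed above (random flips, rotations by multiples of 90°, random cropping, brightness and contrast changes, and Gaussian noise) can be illustrated with plain NumPy. This sketch does not use or reproduce the Faimed3D API; the function name `augment`, the probabilities, the intensity ranges, and the crop margins are all hypothetical, and warping, shearing, trapezoid distortion, blur, and arbitrary-angle rotation are omitted.

```python
import numpy as np


def augment(volume, rng, crop_margin=(2, 8, 8)):
    """Randomly flip, rotate, crop, and intensity-jitter a (slices, rows, cols) volume in [0, 255].

    rng is a np.random.Generator; all probabilities and ranges are illustrative assumptions.
    """
    v = np.asarray(volume, dtype=np.float32)
    # Flip along each of the three axes with probability 0.5
    for axis in range(3):
        if rng.random() < 0.5:
            v = np.flip(v, axis=axis)
    # Rotate within each coronal slice by 0, 90, 180, or 270 degrees
    v = np.rot90(v, k=int(rng.integers(0, 4)), axes=(1, 2))
    # Random crop: shrink each axis by crop_margin at a random offset
    size = [s - m for s, m in zip(v.shape, crop_margin)]
    start = [int(rng.integers(0, m + 1)) for m in crop_margin]
    v = v[tuple(slice(st, st + sz) for st, sz in zip(start, size))]
    # Random contrast (scaling around the mean) and brightness (offset)
    mean = v.mean()
    v = (v - mean) * rng.uniform(0.8, 1.2) + mean + rng.uniform(-20.0, 20.0)
    # Additive Gaussian noise as a simple artifact imitation
    v = v + rng.normal(0.0, 5.0, size=v.shape)
    return np.clip(v, 0.0, 255.0).astype(np.float32)
```

Because cropping changes the array size, a full pipeline would resize or pad the result back to the network input shape. That step is left out here.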
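The accuracy was calculated using Eq. (2):

$$\mathrm{accuracy}(y, \hat{y}) = \frac{1}{n_{\mathrm{samples}}} \sum_{i=0}^{n_{\mathrm{samples}}-1} 1(\hat{y}_i = y_i) \tag{2}$$

where $\hat{y}_i$ is the predicted value of the $i$-th sample, $y_i$ is the corresponding true value, $n_{\mathrm{samples}}$ is the number of samples, and $1(\cdot)$ is the indicator function. The precision, recall/sensitivity, specificity, and F1 score were calculated using Eq. (3):

$$\mathrm{precision} = \frac{TP}{TP + FP}, \quad \mathrm{recall} = \frac{TP}{TP + FN}, \quad \mathrm{specificity} = \frac{TN}{TN + FP}, \quad F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{3}$$

where TN, TP, FN, and FP are the numbers of true negatives, true positives, false negatives, and false positives, respectively.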
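For a three-class problem, the binary quantities in these definitions are typically obtained one-vs-rest: for each CKD group, the other two groups count as negatives. The NumPy sketch below derives per-class TP, FP, FN, and TN from a confusion matrix and applies the accuracy and per-class formulas. The helper names are hypothetical, and mapping a zero denominator to 0 is an assumed convention.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def _safe_div(num, den):
    # Return 0 wherever the denominator is 0 (assumed convention)
    return np.divide(num, den, out=np.zeros_like(num, dtype=float), where=den > 0)


def per_class_metrics(cm):
    """One-vs-rest counts per class and the resulting metrics."""
    cm = np.asarray(cm)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp         # predicted as class k, truly another class
    fn = cm.sum(axis=1) - tp         # truly class k, predicted as another class
    tn = cm.sum() - tp - fp - fn     # all remaining samples
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    specificity = _safe_div(tn, tn + fp)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    accuracy = np.trace(cm) / cm.sum()  # correct predictions over all samples
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "f1": f1, "accuracy": accuracy}
```

For example, six samples with true labels `[0, 0, 1, 1, 2, 2]` and predictions `[0, 1, 1, 1, 2, 0]` give an accuracy of 4/6 and per-class specificities of 0.75, 0.75, and 1.0.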
A receiver operating characteristic (ROC) curve was generated by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The TPR is also known as sensitivity, and the FPR equals one minus the specificity. The ROC-AUC is the area under the ROC curve.
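The following NumPy sketch shows how the ROC curve and its area can be computed. The text does not say how the three-class ROC was formed, so a one-vs-rest construction from predicted class probabilities is assumed here, and all function names are hypothetical. `roc_points` sweeps the thresholds from high to low, `trapezoid_auc` integrates the curve, and `rank_auc` computes the same area as the probability that a randomly chosen positive outscores a randomly chosen negative.

```python
import numpy as np


def roc_points(labels, scores):
    """(FPR, TPR) arrays, one point per distinct score threshold, starting at (0, 0)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    fpr, tpr = [0.0], [0.0]
    for thr in np.unique(scores)[::-1]:        # thresholds from high to low
        predicted_pos = scores >= thr
        tpr.append(float(predicted_pos[labels].mean()))   # sensitivity
        fpr.append(float(predicted_pos[~labels].mean()))  # 1 - specificity
    return np.array(fpr), np.array(tpr)


def trapezoid_auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def rank_auc(labels, scores):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def one_vs_rest_auc(y_true, proba):
    """Per-class AUC, treating class k as positive and all other classes as negative."""
    y_true = np.asarray(y_true)
    return [rank_auc(y_true == k, proba[:, k]) for k in range(proba.shape[1])]
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, both `trapezoid_auc` and `rank_auc` give 0.75.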
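The macro-averaged and weighted scores were calculated using Eq. (4):

$$\mathrm{score}_{\mathrm{macro}} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{score}_i, \qquad \mathrm{score}_{\mathrm{weighted}} = \sum_{i=1}^{N} w_i \, \mathrm{score}_i \tag{4}$$

where N is the number of classes, $\mathrm{score}_i$ is the metric for class i, and $w_i$ is the number of samples in class i divided by the total number of samples. Furthermore, we calculated the Matthews correlation coefficient (MCC) for multi-class classification as a summary measure of agreement between the predicted and true classes.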
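Because the three groups are imbalanced (73, 172, and 76 patients), the macro and weighted averages in this formula can differ. The sketch below computes both. The function name is hypothetical, and the per-class scores in the usage line are made-up numbers for illustration, not results from Table 2.

```python
import numpy as np


def macro_and_weighted(per_class_scores, support):
    """Macro average (equal class weights) and support-weighted average of per-class scores."""
    scores = np.asarray(per_class_scores, dtype=float)
    support = np.asarray(support, dtype=float)
    macro = scores.mean()                 # (1/N) * sum_i score_i
    weights = support / support.sum()     # w_i = n_i / n_total
    weighted = np.sum(weights * scores)   # sum_i w_i * score_i
    return float(macro), float(weighted)
```

For example, `macro_and_weighted([0.9, 0.8, 0.7], [73, 172, 76])` returns a macro score of 0.8 and a weighted score of 256.5/321 (about 0.799). The large moderate group pulls the weighted score toward its own value.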
The MCC was calculated using Eq. (5):

$$\mathrm{MCC} = \frac{c \times s - \sum_{k=1}^{K} p_k \times t_k}{\sqrt{\left(s^2 - \sum_{k=1}^{K} p_k^2\right)\left(s^2 - \sum_{k=1}^{K} t_k^2\right)}} \tag{5}$$

where c is the number of correctly predicted samples, K is the total number of classes, k is the class index from 1 to K, s is the number of samples, $t_k$ is the number of times class k truly occurs, and $p_k$ is the number of times class k is predicted. All statistical analyses were performed using the open-source Python package scikit-learn (version 0.22.1) 20. Statistical significance was set at P < 0.05.
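The multi-class MCC can be computed directly from a confusion matrix using the quantities defined above. In this NumPy sketch, rows are true classes and columns are predicted classes. The function name is hypothetical. Returning 0 when the denominator vanishes, for example when every sample is predicted as one class, is a common convention and is assumed here.

```python
import numpy as np


def multiclass_mcc(cm):
    """Multi-class Matthews correlation coefficient from a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    t = cm.sum(axis=1)   # t_k: number of times class k truly occurs
    p = cm.sum(axis=0)   # p_k: number of times class k is predicted
    c = np.trace(cm)     # number of correctly predicted samples
    s = cm.sum()         # total number of samples
    numerator = c * s - np.dot(p, t)
    denominator = np.sqrt((s ** 2 - np.dot(p, p)) * (s ** 2 - np.dot(t, t)))
    return float(numerator / denominator) if denominator > 0 else 0.0
```

A perfect classifier gives 1, a perfectly inverted binary classifier gives -1, and the three-class matrix `[[1, 1, 0], [0, 2, 0], [1, 0, 1]]` gives 12/√528 ≈ 0.52.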

Results
The performance of the classification models is summarized in Table 2. Figure 4 shows the confusion matrices of all classification models. The best performance was obtained using IP images of bilateral kidneys, with an accuracy of 0.862 ± 0.036. The models for bilateral kidneys achieved better classification performance than those for the right or left kidney. Among the bilateral kidney models, OP images gave inferior results, whereas IP images gave superior performance. Among the models for the right or left kidney, no differences were noted between the T1WI IP/OP/WO images.

Discussion
In this study, we developed a 3D CNN model to classify CKD severity using T1-weighted IP/OP/WO kidney MRIs. Our results showed that the Faimed3D framework with 3D ResNet architecture successfully classified patients with CKD based on the eGFR grade. The best overall accuracy was observed for the model constructed from the IP images of the bilateral kidneys, whereas inferior results were obtained for the other images and unilateral kidney models.
Machine learning and deep learning-based analyses of kidney MRI have been shown to be feasible for classifying CKD severity 13–16. Most studies have examined machine learning methods using MRI-based radiomics analysis to assess renal function in patients with CKD. Notably, our group previously reported that a radiomics model based on Dixon-based T1WI provided adequate classification accuracy for CKD grades 15. In the present study, Dixon-based T1WI was combined with a deep learning approach, and the classification performance appeared to improve compared with our previous results, possibly owing to the convolutional network used in this study. In the previous radiomics analysis, WO images performed better than IP and OP images. Interestingly, in this study, IP images performed best, followed by WO and OP images.
Several aspects of the results are worth discussing. Dixon-based techniques, also called chemical shift imaging, exploit the phase cycling of fat and water signals between in phase (IP) and opposed phase (OP), and allow the acquired images to be computed into four image types: IP/OP/WO/fat-only (FO) images 21. The T1-weighted IP and WO images correspond to "nearly" non-fat-suppressed and fat-suppressed T1WI, respectively, whereas the OP image is an intermediate image in which microscopic fat shows low signal intensity. In this regard, our 3D CNN model performed best with IP images, which may indicate that non-fat-suppressed T1WI are better suited to our model than fat-suppressed images. Notably, recent studies on T1 mapping showed that increased cortical T1 values and decreased T1 corticomedullary differentiation were associated with the severity of renal impairment 22,23. Changes in T1 values may be ascribed to renal pathophysiological conditions such as hypoxia, fibrosis, and inflammation 22,24–27.
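The relationship between the IP, OP, WO, and FO images can be illustrated with the idealized two-point Dixon equations, W = (IP + OP)/2 and F = (IP − OP)/2. This sketch assumes perfect field homogeneity and water-dominant voxels, so that the opposed-phase magnitude equals W − F. Clinical reconstructions additionally correct phase errors and resolve water/fat swaps, so this is not a description of how the WO images in this study were produced. The function name and the fat-fraction helper are hypothetical.

```python
import numpy as np


def two_point_dixon(ip, op, eps=1e-6):
    """Idealized two-point Dixon from IP/OP magnitudes: returns water, fat, and fat fraction."""
    ip = np.asarray(ip, dtype=float)
    op = np.asarray(op, dtype=float)
    water = (ip + op) / 2.0   # water and fat signals add in phase and subtract when opposed
    fat = (ip - op) / 2.0
    fat_fraction = fat / np.maximum(water + fat, eps)  # water + fat equals the IP signal
    return water, fat, fat_fraction
```

For a voxel with IP = 100 and OP = 60, this gives W = 80, F = 20, and a fat fraction of 0.2. A voxel whose IP and OP signals are equal contains no fat under these assumptions.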
We hypothesize that these changes in T1 values are represented on Dixon-based T1WIs, especially IP images, and may provide clues for the CNN to classify the CKD grade. Additionally, Dixon-based images have been used for homogeneous fat suppression or fat quantification and provide better signal-to-noise efficiency than other conventional fat-suppression methods 28. In diabetic nephropathy, lipid deposition in the kidneys can be measured using Dixon-based techniques 29. In recent studies, Dixon-based fat quantification has shown potential for discriminating the severity of CKD. Notably, it has been suggested that lipid accumulates in the renal parenchyma, especially in the cortex, and that this accumulation increases with CKD grade 12. Furthermore, in Dixon-based imaging, double-echo sequences have been reported to detect iron deposition and magnetic susceptibility artifacts associated with T2* effects 30. In this context, renal signal changes caused by lipid or iron accumulation may have helped our 3D CNN model classify CKD grades. This hypothesis could be tested further by introducing quantitative images, such as fat fraction maps, into our 3D CNN models.
In a study on automated kidney segmentation of Dixon-based T1WI, the model created using IP images showed the highest segmentation accuracy, followed by those created using WO and OP images 31. Among Dixon-based T1WI, IP images may provide more favorable contrast than the other images for the CNN to identify the true renal parenchyma 31. This interpretation rests on the idea that the location of the kidney relative to the adjacent liver and spleen, and the contrast between the kidney and these organs, could provide clues for the CNN to identify the renal parenchyma. Signal contrast with the surrounding adipose tissue may also contribute to kidney identification. Moreover, the internal contrast of the kidney and the contrast difference between the two kidneys could help identify features for classifying the grade of renal dysfunction. Class activation mapping, which shows where the CNN focuses, could be useful for testing this hypothesis. Comparing the CNN model created with the images used in this study against a model created with kidney-only images, in which the surrounding tissue is removed using a mask, may also help determine the influence of the surrounding tissue. This aspect will be investigated in future studies.
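Class activation mapping, suggested above as a way to test this hypothesis, is simple for networks such as ResNet 3D-18 that end with global average pooling followed by a fully connected layer. In that case, the map for class c is the weighted sum of the last convolutional feature maps, $M_c = \sum_k w_{c,k} A_k$, where the weights come from the fully connected layer. The NumPy sketch below assumes that the feature maps and weights have already been extracted from a trained model; that step is not shown. The shapes and the name `class_activation_map` are illustrative. This is the original CAM formulation; gradient-based variants such as Grad-CAM are not shown.

```python
import numpy as np


def class_activation_map(features, fc_weight, class_idx, normalize=True):
    """CAM for one class of one input volume.

    features: (K, D, H, W) feature maps from the last convolutional stage.
    fc_weight: (n_classes, K) weights of the final fully connected layer.
    """
    # Weighted sum over the K channels -> (D, H, W) map
    cam = np.tensordot(fc_weight[class_idx], features, axes=([0], [0]))
    if normalize:
        cam = np.maximum(cam, 0.0)   # keep regions that raise the class score
        peak = cam.max()
        if peak > 0:
            cam = cam / peak         # scale to [0, 1] for display
    return cam
```

Global average pooling is linear, so the mean of the unnormalized map equals the class logit minus the layer's bias. This provides a quick correctness check. To overlay the coarse map on the MRI, it would still need to be upsampled to the input size.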
Our study showed that classification performance was better for the bilateral kidney models than for the unilateral kidney models. Previous studies on MRI-based CKD classification using machine learning or deep learning have not considered whether one or both kidneys should be evaluated. Most studies evaluate only one kidney, for several reasons: severe artifacts tend to occur on one side (especially the left) 13, and in radiomics analysis, evaluating only one side shortens the time-consuming delineation of regions of interest 15. However, as the present study suggests, imaging data from both kidneys might contain more integrated and useful information on renal function than data from one kidney.
This pilot study investigated the applicability of Faimed3D-based CNN models for classifying CKD severity using kidney MRI. Faimed3D is a recently released open-source library for implementing 3D CNN models on radiological data 18. Although its models do not go beyond the state of the art, Faimed3D emphasizes usability and speed 18. 3D CNNs are costly in computation and time; however, they have been successfully applied in several recent studies. In the Faimed3D framework, GPU acceleration and a faster callback mechanism can be used to accelerate training and validation with less code and yield better precision 18. Accordingly, we completed the training and validation processes in approximately 10 min per fold. Most medical images consist of 3D volumetric data; therefore, 3D CNNs are favorable because they perform convolution operations in three directions. In 3D networks, the image volume can be divided into smaller cubes to allow different input shapes, thus reducing memory requirements 32. In addition, 3D CNNs facilitate data integration, such as when evaluating bilateral kidneys, as in the present study.
Although machine learning-aided radiomics analysis has been well studied, deep convolutional networks using medical images have not been fully tested in the assessment of CKD status. CNN-based studies on CKD diagnosis and prognosis have primarily focused on clinical and serological datasets 33–35, possibly because kidney MRI is not a routine examination for CKD in clinical settings. Most of these studies were non-image-based, and the overall performance of the models was excellent, with accuracy > 90% 33,34,36. However, all these non-image-based studies performed binary classification, which makes direct comparison with our three-class study difficult. Considering the good performance of CNN models based on clinical and laboratory data, it may be possible to develop more sophisticated models in the future by integrating images with clinical and laboratory data. This preliminary study evaluated a 3D CNN model that simply classifies CKD grades using kidney MRI; our future goal is to develop alternative imaging-based biomarkers that cannot be identified using existing methods. In this context, a recent study using an MRI-based CNN to predict the eGFR decline over time in patients with autosomal dominant polycystic kidney disease is intriguing 37. Ultimately, image-based and non-image-based models for renal function assessment should be integrated to generate a more comprehensive and meaningful model for predicting CKD progression and eGFR decline. Further research is required to confirm the validity and generalizability of these models.
This study has several limitations. First, this retrospective study enrolled 321 patients from a single institution; this is a small sample for a deep learning study, and the CKD groups were imbalanced. Future studies should examine a larger number of patients with more balanced groups. Second, because we excluded patients with renal lesions, some important renal diseases, such as polycystic kidney disease, were not analyzed. Third, we could not analyze other Dixon-based images, such as fat-only images and fat fraction ratio maps, because they were not available for all patients. Fourth, other T1- or T2-weighted images could not be used in this study because they were either not routinely acquired or were acquired in different planes.
In conclusion, a 3D CNN model was developed to classify CKD severity using T1-weighted IP/OP/WO MRIs. The Faimed3D framework with the 3D ResNet architecture can be successfully applied to classify patients with CKD according to disease severity. The overall accuracy was better for the bilateral kidney models than for the unilateral kidney models. The best performance was observed for the model created with IP images of bilateral kidneys, whereas inferior results were obtained for other images and unilateral kidney models. Because this preliminary 3D CNN model can be extended to be more comprehensive and meaningful, these results require further validation.

Figure 1. Flow chart of the inclusion and exclusion criteria for the study.

Figure 2. An overview of the image data processing used in this study. (A) On each Dixon-based T1-weighted kidney MRI, images are cropped around the right and left kidneys; each resulting image volume has 24 coronal slices of 128 × 128 px. (B) These images are then resized to 8 slices of 56 × 56 px for unilateral kidney datasets. Bilateral kidney data are obtained by stacking the data of the right and left kidneys, giving 16 slices of 56 × 56 px. In total, nine datasets are created for the three-dimensional (3D) convolutional neural network (CNN) models by combining (1) each kidney configuration (right kidney, left kidney, or bilateral kidneys) with (2) each imaging method (T1-weighted in-phase (IP)/opposed-phase (OP)/water-only (WO) images). 3D residual network-18 (3D ResNet-18)-based classification is performed on each dataset, classifying patients into the three severity groups of chronic kidney disease (CKD).

Figure 3. The architecture of our three-dimensional residual network-18 model. The input is processed volumetric data from kidney magnetic resonance imaging. The network contains an initial convolutional layer, followed by 8 residual units (two with filters = 64, two with filters = 128, two with filters = 256, and two with filters = 512), each with two convolutional blocks as shown in the bottom row. The last layer is a fully connected dense layer that outputs a classification into three groups.
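To make the Figure 3 architecture concrete, the following pure-Python sketch traces feature-map sizes through an R3D-18-style network using the standard convolution output-size formula. It assumes torchvision-like `r3d_18` hyperparameters: a 3 × 7 × 7 stem with stride (1, 2, 2) and no max pooling, 3 × 3 × 3 convolutions, and stride 2 in the first block of stages 2 to 4. Whether Faimed3D's default ResNet 3D-18 uses exactly these settings is an assumption, and the function names are hypothetical.

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def r3d18_feature_shapes(depth, height, width, n_classes=3):
    """(channels, slices, rows, cols) after each stage of an R3D-18-style network (assumed settings)."""
    shapes = {}
    # Stem: kernel (3, 7, 7), stride (1, 2, 2), padding (1, 3, 3), 64 filters
    d = conv_out(depth, 3, 1, 1)
    h = conv_out(height, 7, 2, 3)
    w = conv_out(width, 7, 2, 3)
    shapes["stem"] = (64, d, h, w)
    # Four stages of two residual units each (8 units in total, as in Figure 3);
    # only the first 3x3x3 convolution of a stage carries the stride
    for name, filters, stride in [("layer1", 64, 1), ("layer2", 128, 2),
                                  ("layer3", 256, 2), ("layer4", 512, 2)]:
        d, h, w = (conv_out(n, 3, stride, 1) for n in (d, h, w))
        shapes[name] = (filters, d, h, w)
    # Global average pooling, then a fully connected layer to the class scores
    shapes["fc"] = (n_classes,)
    return shapes
```

Under these assumptions, a 16 × 56 × 56 bilateral input reaches the final stage as 512 × 2 × 4 × 4 feature maps, and an 8 × 56 × 56 unilateral input as 512 × 1 × 4 × 4, before global average pooling and the three-class fully connected layer.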

Table 2. Performance of three-dimensional convolutional neural network-based classification of the three groups of chronic kidney disease. Data are presented as means ± standard deviation. IP, in-phase; OP, opposed-phase; WO, water-only; MCC, Matthews correlation coefficient.